1 - Spatial Data Science

If you think that spatial data is just a set of longitude and latitude observations for a given area of geographic space, to be treated as such regardless of context, you are missing out on opportunities and run a high risk of producing meaningless analysis. Spatial data is not, in fact, just a set of records of observable events and/or facts in geographical space. It is the (potential) starting point for answering relevant questions about various problems in a given reality.

In the new era of Big Data, where decision-making is increasingly data-driven, Spatial Data Science has emerged as an essential field, with applications spanning many sectors of society (e.g. agriculture, environmental science, urban planning, clean energy generation, sanitation, water management, and much more).

For these and many other reasons, I challenge you to reflect on your convictions about spatial data, this important entity in the context of Big Data. A lot has changed, is changing, and will change faster and faster, especially due to advances in visual and computational technologies. Whatever your area of interest in this field, these changes presuppose, and to some extent demand, that you develop new skills.

1.1 - R for Spatial Data Science

At some point in your life you have heard someone say that “R is a powerful tool”. This is true, and you don’t need to make a big effort to find n reasons why. When I wrote this notebook (using R Markdown), I didn’t do it thinking about explaining what R is and how it works; the purpose of this notebook goes beyond that part, which I assume you already know. My goal here is to demonstrate, with one possibility, how you can use R to start your studies in Spatial Data Science.

2 - Hands on

2.1 - Interactive Maps [Point]

In this section, I use spatial data on the occurrence (point geometries) of land degraded by erosion processes in natural and anthropogenic ecosystems in a watershed located in the Brazilian Semiarid Region. Specifically, the data was collected and organized by me over successive field campaigns in the state of Bahia between 2022 and 2024.

Land degradation is understood as “the reduction of soil capacity to generate assets and services, in qualitative and quantitative terms, due to the decline of its productive potential and of its environmental regulation capacity” (LAL, 2012). From this perspective, land degradation is a complex process, associated with natural factors and human activities. According to Silva (2023), land degradation is closely related to landscape degradation, considering the soil-landscape relationship.

In the figure below I present a record (21-06-2023, 17:06), organized by me, of an area highly degraded by an erosive process (Ravina), with a strong impact on the landscape.

To reproduce this document, you need to have R installed on your machine, as well as RStudio, an Integrated Development Environment (IDE) for R.

In the chunk below you will set your working directory. To do this, use setwd() and pass the directory path.

setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")

In the chunk below you will install the necessary packages. As I already have these packages installed, note that I left the commands as comments, adding the #. If you don’t have these packages installed, just remove the # and run. To install a package you use install.packages().

The first package to install is leaflet, a visualization tool that is highly recommended for those who want interactive maps.

The second package to install is sf (Simple Features), a very useful package for representing and working with spatial vector data, including points, lines, and polygons.

#install.packages("leaflet")
#install.packages("sf")

Once installed, load the packages in the chunk below. All you have to do is use library().

library(leaflet)
library(sf)

Here, in the chunk below, you will read the spatial data, in this case, point geometries (.shp). Note that it will be stored in a variable named shape.

shape <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/LandDegradation.shp')
## Reading layer `LandDegradation' from data source 
##   `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\LandDegradation.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 2146 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -40.69852 ymin: -11.37433 xmax: -40.67941 ymax: -11.34879
## Geodetic CRS:  WGS 84
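
Before mapping, it can be worth taking a quick look at what st_read() returned. A minimal sketch using standard sf/base functions (the Tipo field comes from the data shown above):

```r
library(sf)

# Inspect the sf object returned by st_read()
head(shape)        # first features: attribute columns plus geometry
st_crs(shape)      # coordinate reference system (here, WGS 84)
table(shape$Tipo)  # number of points recorded for each erosion type
```

Checking the CRS and the attribute table at this point avoids surprises later, since leaflet expects longitude/latitude coordinates.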

In the chunk below you can create your first interactive map using the leaflet package. In this situation, your interactive map shows the occurrence of land degradation. Note that the information is presented in clusters: if you zoom in, you’ll see the clusters gradually dissolve. This is a very interesting approach when you have a large number of points that could visually pollute the map.

leaflet(shape) %>%
  setView(lat = -11.351051, lng = -40.684630, zoom = 16) %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI World Imagery") %>%
  addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
  addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
                   options = layersControlOptions(collapsed = TRUE)) %>%
  addMarkers(lat = ~Lat, lng = ~Long,
             clusterOptions = markerClusterOptions(),
             label = shape$Tipo,
             labelOptions = labelOptions(
               textsize = "14px",
               style = list("font-weight" = "bold", padding = "5px")))

In the chunk below you can create your second interactive map using the leaflet package. Note that this time the information is categorized by the Tipo field (type of erosion) and is also associated with colors (yellow, orange and red) that reflect the severity of the process.

Tipo <- c("Boçoroca", "Ravina", "Sulcos")

my_palette <- c("red", "orange", "yellow")

#previewColors(colorFactor(my_palette, levels = Tipo), Tipo)

factpal <- colorFactor(my_palette, levels = Tipo)

groups <- unique(shape$Tipo)

map <- leaflet(shape) %>%
  setView(lat = -11.351051, lng = -40.684630, zoom = 17.3) %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery")

for (i in groups) {
  data <- shape[shape$Tipo == i, ]
  map <- map %>% addCircleMarkers(data = data, radius = 4, weight = 2,
                                  opacity = 1, fill = TRUE, fillOpacity = 0.2,
                                  color = ~factpal(Tipo), group = i)
}

map %>% addLayersControl(overlayGroups = groups,
                         options = layersControlOptions(collapsed = FALSE))

A few background questions:

  • What information can you get from looking at this data?
  • Can this tool be useful in your workplace?
  • What difference does it make if you implement interactive maps in your reports?

2.2 - Interactive Maps [Polygon]

In this section, I will use spatial data (polygon geometries) on population density in a neighborhood of Formosa City, Goiás, Brazil. Specifically, the raw data describing the number of inhabitants at the census sector scale was collected and made available digitally by the Brazilian Institute of Geography and Statistics (IBGE). It was therefore up to me to acquire, organize and process the data (I did this at a previous stage using resources of the QGIS environment) in order to arrive at the population density.
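
The density step done in QGIS could also be sketched directly in R with sf. A hypothetical sketch, assuming the attribute table has a field Inhabitants (an assumed name; adapt it to your own data) with the census count per sector:

```r
library(sf)
library(units)

# Hypothetical sketch: derive inhabitants per hectare for each census sector.
# 'Inhabitants' is an assumed field name, not part of the original shapefile.
Population$Area_Ha   <- set_units(st_area(Population), ha)  # polygon area in hectares
Population$Densid_Ha <- as.numeric(Population$Inhabitants / Population$Area_Ha)
```

st_area() computes areas on the ellipsoid for geodetic data, so the result is meaningful even though the layer is stored in WGS 84 longitude/latitude.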

Population density is very useful information in a variety of situations involving decision-making in cities, and beyond. For example, population density information may be relevant for identifying, mapping and monitoring sectors where there is a high risk of COVID-19 infection.

In the figure below I present a record (13-02-2024, 14:19), organized by me, of a street located in the Formosinha neighborhood, in the city of Formosa, Goiás, Brazil. Looking at the context (presence of buildings, arrangement of the houses…), you have indications that this is a sector with considerable population density.

library(leaflet)
library(sf)
library(RColorBrewer)
library(classInt)

Here, in the chunk below, you will read the spatial data, in this case, polygon geometries (.shp). Note that it will be stored in a variable named Population.

Population <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/PopulationDensity.shp')
## Reading layer `PopulationDensity' from data source 
##   `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\PopulationDensity.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 21 features and 4 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -47.33796 ymin: -15.56236 xmax: -47.32025 ymax: -15.54231
## Geodetic CRS:  WGS 84

If your data is in another CRS (Coordinate Reference System), in the chunk below you can transform it to the longitude/latitude coordinates that leaflet expects. In this particular case, this does not apply, because the data being read is already in the correct CRS.

#Population <- st_transform(Population, crs = '+proj=longlat +datum=WGS84')
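
To decide whether the transform is needed at all, you can query the CRS first. A small sketch using sf's st_is_longlat() helper:

```r
library(sf)

# Transform only when the layer is not already in geographic (long/lat) coordinates
if (!st_is_longlat(Population)) {
  Population <- st_transform(Population, crs = 4326)  # EPSG:4326 = WGS 84
}
```

Using the EPSG code 4326 is equivalent to the '+proj=longlat +datum=WGS84' string and is the more common idiom in current sf code.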

Once this is understood, with the command in the chunk below you can plot the interactive map.

leaflet() %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery") %>%
  addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
  addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
  options = layersControlOptions(collapsed = TRUE)) %>%
  setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
  addMiniMap(width = 150, height = 150) %>%
  addPolygons(data = Population, color = "black", weight = 1, opacity = 0.8)

In the chunk below, set the parameters used to create a map categorized by the population density of the neighborhood.

# jenks class breaks, shared by the fill colors and the legend
breaks_qt <- classIntervals(Population$Densid_Ha, n = 5, style = "jenks")
pal_fun <- colorBin("YlOrRd", domain = Population$Densid_Ha, bins = breaks_qt$brks)
p_density <- paste0("<strong>Densid_Ha: </strong>", Population$Densid_Ha)

Considering the defined parameters, here you can plot and see the interactive map:

leaflet(Population) %>%
  addPolygons(
    stroke = TRUE,
    color = "black",
    fillColor = ~pal_fun(Densid_Ha),
    fillOpacity = 0.8, smoothFactor = 0.5,
    popup = p_density,
    group = "Population") %>%
  addTiles(group = "OSM") %>%
  addProviderTiles("Esri.WorldImagery", group = "ESRI World Imagery") %>%
  setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
  addLegend("bottomright",
            colors = brewer.pal(5, "YlOrRd"),
            labels = paste0("Up to ", format(breaks_qt$brks[-1], digits = 2)),
            title = 'Population Density (h/ha)') %>%
  addLayersControl(baseGroups = c("OSM", "ESRI World Imagery"),
                   overlayGroups = c("Population"))

As obvious as it may seem, when working with spatial data ALWAYS REMEMBER THAT the data doesn’t explain itself, because it doesn’t speak; the map doesn’t show, indicate or suggest anything on its own, because the map doesn’t have a life of its own. YOU are expected to do this, and what’s more, YOU are expected to see beyond the data. The limited set of data we are working with is not always enough to answer our questions; it is up to you to know its limits and the generalization power of the information.

2.3 - Interactive Maps [Raster]

3 - SIDRA-IBGE data

4 - Google Earth Engine

5 - Brazil Data Cube


  1. If you like it and want to contribute: pix: izaiasdesouzasilvaa@gmail.com↩︎